Predicting the Academic Performance of Graduate Students:

A Review 

John Senger 
and 

Richard Elster 

Graduate schools everywhere are faced with the naggingly persistent
responsibility of selecting from a list of applicants those most likely to perform
successfully in their programs. The process by which students are selected has 
developed into a modest science falling somewhere between running a longshoreman’s 
morning line-up and the choosing of astronauts. 

The problem has generated a literature which is reviewed here to determine 
what predictors and what criteria are used for graduate student selection and 
to evaluate the relative success of the predictors used. The bulk of the research 
involves correlation analysis, and it is hoped that the tabular presentation of 
these data will provide the reader a holistic impression of the varied findings. 

The article is organized into the following five segments: The Criterion 
Problem, Kinds of Predictors, Aptitude Variables as Predictors, Environmental 
Variables as Predictors, and Personality Variables as Predictors. 

The Criterion Problem 

The most frequently used criterion is graduate grade point average (GGPA),
probably because it is an easy one to use. Systems exist for grading student
performances, gathering these data, and reducing them to one simple statistic: the
grade average. There have been sufficient questions about grading systems and the
concept of grades generally (cf. Newsweek, and Payne, 1968) to cast some doubt
on this criterion, however. When college grades themselves are cast in the role
of a predictor, their performance has been mediocre to poor (cf. Hoyt, 1965). In
some cases criteria similar to grade point average are used, e.g., achievement
examination scores or proportions of "A" grades. Other criteria for graduate
student performance include: (1) success or failure in completing an academic
program, e.g., M.S. or Ph.D. (see references in Table III); (2) faculty ratings
of students other than by grades (cf. Hilton, Kendall, and Sprecher, 1970); and
(3) self-ratings (cf. Hackman et al., 1970). However, the criterion most often
encountered in the literature was the grade average.

Kinds of Predictors

Measures of academic aptitude are some of the most often used predictors.
These include the Graduate Record Examination Aptitude test, which measures
quantitative (GRE-Q) and verbal (GRE-V) aptitude. The Graduate Record
Examination Advanced (GRE-A) tests examine knowledge in various academic
disciplines. The literature reveals the Miller Analogies Test (MAT) to be popular
among psychology departments and schools of education. This paper will
concentrate on these test measures (GRE-V, GRE-Q, GRE-A, and MAT) and undergraduate
grade point average (UGPA), because their popularity makes possible a comparative
analysis. UGPA may be said to represent an intelligence measure, but as a sample
of past performance it also undoubtedly reflects motivational and other individual
differences (Tyler, 1965, p. 105ff.).

Other predictors we encountered in our review are myriad. Here are some: 

The number of courses taken in a specific discipline. 

The grades in specific courses. 

Written statements by the candidate. 

Letters of recommendation. 

Personal interviews. 

Biographical data (from birth order to age, to amount of laboratory 
experience, etc.) 
Quality of undergraduate institution.

Environments of undergraduate institutions. 

Personality and interest measures. 

Measures of motivation. 

We see here most of the sorts of predictors used in any selection process. 

The predictors reported in the open literature are predominantly individual 
intelligence measures. The results obtained using such measures will be presented 
in the next section of this paper. 

Aptitude Variables as Predictors 

By far the most popular method of analyzing the relationship between
predictors and criteria is via correlational statistics, e.g., product-moment,
biserial, and point-biserial correlations.

Tables I, II, and III summarize data from 31 analyses made during the
past decade. The coefficients are presented without reference to their
statistical significance. The reader should also be aware that not all
of these correlations represent the results of cross-validation efforts.

The studies in the literature using multiple regression analysis are not
reviewed here because the varying mix of predictors makes the findings difficult
to compare. Generally speaking, the inclusion of several pertinent variables
in a multiple regression analysis can, of course, improve the correlation.
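
The way in which combining predictors can raise the correlation may be
illustrated with the standard formula for the multiple correlation of a
criterion with two predictors. The sketch below is an illustration only: the
function name and the input correlations are hypothetical and are not taken
from any of the studies reviewed here.

```python
import math


def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation R of a criterion with two predictors.

    r_y1, r_y2: correlation of each predictor with the criterion.
    r_12: correlation between the two predictors (requires |r_12| < 1).
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical values: two predictors, each correlating .30 with the
# criterion and .20 with each other.
combined = multiple_r_two_predictors(0.30, 0.30, 0.20)
print(round(combined, 2))  # about .39, higher than either predictor alone
```

The gain shrinks as the predictors become more highly intercorrelated, since
they then carry largely the same information.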

Other studies using non-comparable analytical techniques, e.g., discriminant
function analysis, have not been included. Table I presents data from studies
utilizing either graduate grade point average or achievement examination scores
as the criterion. Table II shows studies using faculty ratings of students as
criteria, and in Table III completion of degree requirements is the criterion.

All three tables present Graduate Record Examination-Verbal, Graduate Record
Examination-Quantitative, Graduate Record Examination-Advanced, the Miller
Analogies Test, and Undergraduate Grade Point Average as predictors. (For the
Graduate Record Examination-Advanced, the area is indicated, where applicable,
in the "course of study" column; see the reference list for complete references.)
Studies in academic areas such as psychology and education seem to have a
disproportionately heavy representation in Tables I-III, due, no doubt, to the
popularity of this sort of research in these disciplines. A large proportion of
the coefficients presented in the tables represent statistically significant
relationships; however, it is not statistical significance but predictive
significance that we wish to emphasize in this review. Individuals participating
in the selection of students for graduate study will, of course, want also to
consider the magnitudes of the correlation coefficients in the context of their
departments’ selection ratios and current base rates of student success. These
factors play a major role in determining whether or not a correlation between a
criterion and a predictor set is "large enough" (Taylor and Russell, 1939;
Abrahams, Alf, and Wolfe, 1971).

Choosing, albeit arbitrarily, a correlation coefficient value of .35
as indicating marginal predictive respectability, we find in Table I that less
than half of the 42 GRE-V coefficients equal or exceed that value. A higher
standard, .50, yields only three correlations. The GRE-Q is even less inspiring.
Thirteen of the forty-one coefficients exceed .35, and four are above .50. The
same results exist for the GRE-Advanced data; slightly more than half the correlations
exceed .35 and two are higher than .50. MAT results are at least as bleak: five of
the six coefficients presented are below .35.

In what appears to have been an exhaustive review of studies using the GREs,
Willingham summarizes the predictive track records of the GREs, UGPA, and letters
of recommendation. The validities of these measures in predicting criteria such
as GGPA and overall faculty ratings are presented (via median validity coefficients),
as are their median validity coefficients in each of nine fields of graduate
study. These median validity coefficients are nearly always less than .40, and
have a typical value of around .30.

One might argue that aptitude measures should not be particularly good 
predictors of scholastic productivity, because student motivation may be such 
an important factor. Undergraduate grade average should reflect both intelligence 
and motivation and should, therefore, be a better predictor of graduate performance 
than aptitude measures alone. Right? Wrong! Table I shows only one study in 
which the correlation between GGPA and UGPA was above .35. None exceeded .50. 

These results do not produce great confidence in what one might otherwise 
believe to be useful predictors. Perhaps it is the criterion, graduate grades, 
that is responsible for the relatively low correlations. Tables II and III 
present alternative criteria - but with the same kind of indifferent results. 

Using completion of degree requirements as the standard, we find the correlations 
to be positive, but again at a very modest level. Of the 37 correlations with 
GRE-V, GRE-Q, GRE-A, MAT, and UGPA, only one exceeds .50, and it’s the same one that 
exceeds .35. 

When we look at the faculty evaluation criterion, only two (GRE-A and UGPA)
of the 19 coefficients exceed .35, and none are over .50. Why do these
predictors perform so modestly?

Somewhat higher validity coefficients were found in studies carried out
at the Naval Postgraduate School in 1966 and 1967. Here we find correlations
between grades and GRE-V of .51, .44, and .43, and GRE-Q correlations are
even higher: .73, .70, and .65. These higher correlations may be caused by
such factors as adequate financial support of the students during their
studies. They may also reflect a homogeneous level of motivation for older,
more career-ensconced students. Another plausible explanation is statistical.
In nearly all the studies reported here, with the exception of the Naval
Postgraduate School studies, the predictor data were used to select students
for the graduate programs. Truncated samples undoubtedly resulted, with only
the higher ends of the predictor distributions being represented. This being
the case, the correlations would be lower than they would have been without
the restriction of range on the predictors (Thorndike, 1949, p. 170).
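
The size of this effect can be gauged with the standard correction for direct
restriction of range on the predictor (often called Thorndike's Case 2). The
sketch below assumes explicit selection on the predictor alone and a linear,
homoscedastic relationship between predictor and criterion; the function names
and the example values are hypothetical.

```python
import math


def correct_for_range_restriction(r_restricted, sd_ratio):
    """Estimate the correlation in the unrestricted applicant group.

    r_restricted: correlation observed among the selected students.
    sd_ratio: predictor SD among all applicants divided by the predictor
              SD among the selected students (greater than 1 when the
              selected group is more homogeneous).
    """
    u = sd_ratio
    return (r_restricted * u) / math.sqrt(
        1 - r_restricted ** 2 + (r_restricted ** 2) * (u ** 2)
    )


def restrict_range(r_unrestricted, sd_ratio):
    """Inverse of the correction: the correlation expected after selection."""
    return correct_for_range_restriction(r_unrestricted, 1 / sd_ratio)


# Hypothetical example: an observed validity of .30 in a selected group
# whose predictor SD is half that of the full applicant pool.
print(round(correct_for_range_restriction(0.30, 2.0), 2))  # 0.53
```

Under these assumptions, a modest coefficient obtained on a truncated sample is
compatible with a considerably larger one in the full applicant pool.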

Of course, the intellectual factors represented in this section are not
the only ones which affect student performance. The low correlations may
reflect the influence of variables other than intelligence. In the Kinds of
Predictors section several such factors were listed. A substantial amount of
research has been performed on two of them, the college environment and
individual personality. Though the major part of this research has focused on
undergraduates, the findings might well be extrapolated to graduate level
performance prediction. A discussion of some of these studies is included in
the following two sections.

Environmental Variables as Predictors 

Most of the research into college environment and institutional quality
revolves around Astin’s examination of the subject during the 1960’s. Astin
and Holland developed an Environmental Assessment Technique. This includes
three sets of variables: (1) the six Holland (Holland, 1959) vocational
classifications (Realistic, Intellectual, Social, Conventional, Enterprising and
Artistic), (2) the size of the institution, and (3) the intelligence level of
the student body. The latter is an estimate derived from a sample of undergraduates
entering 335 institutions. The estimate was based on their National Merit
Scholarship scores (Astin and Holland, 1969, p. 308). Astin found the
environmental variables Intellectual, Enterprising and Artistic to correlate positively
with student body intelligence level. The Realistic, Social and Conventional
orientations correlated negatively. Astin found that the aspiration of talented
students to obtain a Ph.D. was negatively affected by the size of the student
body and the Conventional orientation (Astin, 1963). Institutions scoring high
on these factors tended to emphasize sports and social activities at the
expense of scholarship. He also found that student-faculty relations were less
effective and improvement in study habits was inhibited. In another study, Astin
found that student achievement, as measured by GRE Advanced tests in the areas
of social science, natural science and humanities, correlated poorly with
traditional indices of institutional quality such as intellectual level of
classmates, competitiveness or institutional affluence (Astin, 1968). Institutions
characterized by intelligent students, competitiveness and affluence do turn out
students that perform better on a variety of accomplishment measures. This
difference disappears, however, when a correction for individual differences
in student ability is made. Freshman grade point averages were higher for
students in the more selective schools, but when selectivity was taken into
account, such factors as size, wealth, location, type of control (e.g., state,
private, religious) and curriculum appeared to make little difference in
student performance (Astin, 1971, p. 27).
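
A correction for individual differences in ability, of the kind mentioned
above, is commonly expressed as a first-order partial correlation, which removes
the linear influence of a third variable from the relationship between two
others. The sketch below is an illustration only and is not a reconstruction of
Astin's procedure; the function name and the three input correlations are
hypothetical.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y with the linear effect of z removed from both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values: x = institutional selectivity, y = student
# achievement, z = individual student ability.
r_partial = partial_correlation(r_xy=0.40, r_xz=0.60, r_yz=0.60)
print(round(r_partial, 2))  # 0.06
```

With these made-up figures, an apparent institutional effect of .40 shrinks to
nearly zero once student ability is held constant.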

Hood and Swanson (1965), using somewhat different methods, characterized
colleges in the state of Minnesota as agriculture, institute of technology,
college of liberal arts (all in the University of Minnesota), private liberal
arts, Catholic male, Catholic female, state and junior. They were able to
state that a student falling at the 50th percentile in the Minnesota Scholastic
Aptitude Test would probably be expected to fail at the University Institute
of Technology or College of Liberal Arts with a 1.4 or 1.6 (out of 4.0) average,
to be placed on probation with a 1.9 at a typical private liberal arts college
or Catholic men’s college, and to make a C+ average at a Catholic women’s college,
a junior college, or a state college.

Hackman, Wiggins and Bass (1970) found that the "quality" of the undergraduate
institution, as assessed by members of the University of Illinois Psychology
Department faculty, correlated .30 with the student’s own assessment of his progress
toward a Ph.D., .31 with faculty judgment and .43 with his relative "success" six
years out of school. ("Success" ranged from failure to complete the doctorate to
appointment to a position in a "highly prestigious" institution.)

It appears the conventional wisdom is correct: a B+ average at one institution
may not reflect the same level of educational accomplishment as does a B+ average
at another institution. Typically, a student could expect to obtain higher grades
in a less selective college than he could in a more selective one, as grade
distributions at an institution tend to float with the relative abilities of the
students in attendance (Hoyt and Munday, 1966).

There is some indication that newer environmental assessment techniques
which identify a school’s "personality" may provide insight into the performances
of its graduates in advanced degree programs. In summary, it would seem that use
of a measure of undergraduate institutions’ selectivity, like Astin’s seven-point
scale (Astin, 1971, p. 48), should help improve predictions of graduate study
performance.

Personality Variables as Predictors

Personality factors would seem to account for some of the variance unexplained
by academic aptitude measures and undergraduate grades. One variable would be
the need for achievement. Projective testing of this motivation orientation has
been carried out at Harvard by McClelland and his colleagues. Early results were
spotty, varying from moderate positive to negative correlations between Thematic
Apperception Test (TAT) need achievement scores and college grades (McClelland,
Atkinson, Clark and Lowell, 1953). Personality inventory measurements of
achievement motivation have fared little better. A study by Gough and Hall
(1964) found no significant correlations between the two achievement scales
on the California Psychological Inventory (CPI) and medical school GPA, but
significant correlations were found for Sociability (.35), Tolerance (.34) and
Intellectual Efficiency (.40). In an unpublished study at the Naval Postgraduate
School, Senger, Wyatt, and Knapp found statistically significant correlations
between academic performance and CPI Achievement via Independence (.26),
Intellectual Efficiency (.25), Psychological Mindedness (.24) and Flexibility (.17).

The Achievement scale on the Edwards Personal Preference Schedule (EPPS) 
is another popular measure of this motivation, but, as in the case of the CPI, 
the literature relating it to academic performance seems surprisingly thin. 

At Carnegie Institute of Technology, Krug (1959) found significantly higher 
scores on the Achievement scale among academic over-achievers as compared to 
under-achievers. Gabhart and Hoyt (1958) found similar results at Kansas State. 
Both studies found that the Need for Order discriminated between under- and
over-achievers. At the Naval Postgraduate School an unpublished study by Golanka
and Gilmore (1967) found under-achievers scoring significantly higher than 
over-achievers on Achievement and Order. Senger, Wyatt and Knapp (1969) 
found, in another study at the Naval Postgraduate School, a significant positive 
correlation (.23) between EPPS Achievement and graduate grade point average. 

In sum, the situation still seems as described in 1949 (Donahue, Coombs, and
Travers): motivational measures and grades tend to be only slightly
interrelated.

Studies relating the Strong Vocational Interest Blank (SVIB) to academic
performance are examples of attempts to relate scholarship to personal interests.
Hauntras, Lee and Hebahlran (1973) found the SVIB Academic Achievement (AACH)
scale to correlate .37 with grades for 423 freshmen. Johnson (1969) found
correlations between AACH and GPA to be .17 for arts and science students and
.02 for business administration students. In the original validation studies,
Campbell and Johansson (1966) found a correlation of .36 between AACH and GPA
for their freshmen cross-validation group. Lindsay and Althouse (1969) found
an AACH-GPA correlation of .10 for male freshmen and of .25 for women. A .35
correlation between AACH and GPA was found by Wagman (1971) for an undergraduate
and graduate sample.

Scores on the SVIB occupational scales are predictive of students’ tendencies
to stick with a curriculum, but usually are not predictive of grades (Kellogg,
1968). The lack of a correlation between SVIB scores and grades may be another
example of the impact of restriction of range, i.e., self-selection may have
yielded groups having homogeneous SVIB occupational scale scores.

Summary and Conclusions 

Examination of the accompanying tables of correlations between intellectual 
and other measures and criteria of graduate student performance does not encourage 
one to increase his faith in the validity of the popular predictors. It must 
be stated, however, that the relationships are positive, and when we take into 
account that most are based upon truncated samples not including low scorers, 
validities of these measures may be better than the typical study makes them 
appear.

The phenomenon of low predictor-criterion relationships may not be
restricted to academic performance. Ghiselli (1966) did not find very high
coefficients (almost always less than +.30 with performance criteria) in his survey
of the validity of occupational aptitude tests. Difficulties in predicting
performance appear to be universal. It should be noted again that the studies
reviewed here were often of concurrent, rather than predictive, validity design.
The correlations are, therefore, probably smaller than they would have been
without restriction of range. It should be further cautioned that the tables
include samples which have not been cross-validated, probably exaggerating the
strengths of the relationships presented.

Can measures of non-intellectual factors be used to improve predictions?
It appears that the "quality" of the undergraduate institution may be useful
in predicting graduate student performance. The underlying factor here may
be the selectivity of the college; in any case, this "environmental" input may
be useful in interpreting the meaningfulness of the undergraduate grade point
average.

It would seem that standard measures of motivation and interest should be 
worthwhile supplemental predictors; unfortunately studies to date do not support 
this expectation. Investigations using the TAT, CPI and EPPS do not provide 
much data which would give one confidence in finding a useful measure of motivation. 
The investigations using these instruments show their predictive power is usually 
low. The SVIB offers little more; the few studies show neither the occupational 
scales nor the Academic Achievement scale offering strong relationships with 
academic performance. Perhaps Tyler (1965, p. 119) provided an explanation when 
she wrote: 



A conclusion suggested by this research and compatible with 
all the previous work in this and other settings is that the differences
in motivation leading to differences in school achievement
are not those that personality theorists, with their background in 
the clinic and the hospital, tend to think of first. They are not 
differences in basic drives but in learned habits of work. They 
are not differences in the degree to which negative qualities like 
anxiety and neurotic traits are present but rather the degree to 
which strong and well organized positive qualities such as interests, 
commitment, or enthusiasm about some line of endeavor characterize 
an individual. 
The necessity to choose from among the applicants to graduate schools
persists, however, and though the relationships between predictors and
criteria are not particularly strong, they can be useful for decision making
if selection ratios are sufficiently small and prior base rates of success
are auspicious (Meehl and Rosen, 1955). Dawes (1971, p. 180) stated that
top graduate departments were considering as many as 100 applicants for every
graduate student selected for admission. With such a selection ratio, even
predictors with low validities can be expected to be useful in decision making
(Taylor and Russell, 1939, and Abrahams, Alf, and Wolfe, 1971). So, from a
decision-making point of view, the predictors of graduate school performance
may be, in the current marketplace, good enough.
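
The interplay of validity, selection ratio, and base rate described by Taylor
and Russell can be sketched numerically. The code below assumes that predictor
and criterion follow a bivariate normal distribution and that applicants are
admitted strictly from the top of the predictor distribution; the function
name, the numerical integration scheme, and the example values are hypothetical
choices made for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def success_rate_among_selected(validity, selection_ratio, base_rate, steps=4000):
    """Proportion of admitted applicants expected to succeed.

    validity: predictor-criterion correlation, strictly between -1 and 1.
    selection_ratio: proportion of applicants admitted (between 0 and 1).
    base_rate: proportion of all applicants who would succeed if admitted.
    """
    x_cut = _STD_NORMAL.inv_cdf(1 - selection_ratio)  # predictor cutoff
    y_cut = _STD_NORMAL.inv_cdf(1 - base_rate)        # criterion cutoff
    residual_sd = math.sqrt(1 - validity ** 2)

    def integrand(x):
        # Density of x times the chance of success given predictor score x.
        p_success = 1 - _STD_NORMAL.cdf((y_cut - validity * x) / residual_sd)
        return _STD_NORMAL.pdf(x) * p_success

    # Trapezoidal integration from the cutoff far into the upper tail.
    upper = max(x_cut, 0.0) + 8.0
    h = (upper - x_cut) / steps
    total = 0.5 * (integrand(x_cut) + integrand(upper))
    for i in range(1, steps):
        total += integrand(x_cut + i * h)
    return (total * h) / selection_ratio


# Hypothetical comparison: validity .30, base rate .50, admitting one
# applicant in two versus one applicant in a hundred.
for ratio in (0.50, 0.01):
    print(ratio, round(success_rate_among_selected(0.30, ratio, 0.50), 2))
```

With a validity of zero the function returns the base rate itself, which is a
useful sanity check; under these assumptions, the success rate among those
admitted rises as the selection ratio shrinks for any positive validity.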

Finally, the reader should be reminded that studies using multiple predictors 
simultaneously were not reviewed in this paper. Such studies, usually using 
multiple regression, often yield higher validity coefficients than are found 
when using a single predictor. 